ABSTRACT

The Lydia Pinkham data used by Palda and by Clarke and McCann
to evaluate the lagged effects of advertising are reanalyzed using
a Box-Jenkins transfer function analysis. After a general overview
of the technique, the steps in the analysis are described and the
empirical results at each stage are reported. The need for proper
pre-whitening of the advertising series is stressed. The final results
indicate that there are no substantial effects of advertising
beyond the first year, confirming Clarke and McCann's cross-spectral
analysis.


I.   INTRODUCTION 

Any  decision  maker  needs  to  know  the  effectiveness  of  each 
course  of  action  he  is  considering.   This  is  a  basic  step  in  the 
scientific  method  of  making  decisions.   In  the  absence  of  a  well 
developed  theory,  the  decision  maker  is  forced  to  either  experiment 
or  examine  past  experience  with  the  help  of  statistical  analysis. 
A somewhat novel empirical procedure of this kind has been developed
by two applied statisticians, George E. P. Box of the University
of Wisconsin and Gwilym M. Jenkins of the University of Lancaster,
U.K. Their "Box-Jenkins transfer function analysis" is applicable
to some of the more common questions that advertisers and researchers
alike have attempted to answer. For example, are there only current
effects of advertising, or lagged effects also? If there is a dynamic
response, is it short lived or continuing? Is the greatest effect
of advertising immediate or delayed?

One of the first and best known investigations into lagged
effects of advertising is Palda's [7]. His analysis of the Lydia
Pinkham data showed that the response was dynamic and long lived, and
that the greatest effect was immediate.

More recently, Clarke and McCann have reanalyzed the Lydia
Pinkham data by Frequency Domain Analysis, a type of cross-spectral
analysis [3]. They concluded that no advertising effects were
significant for periods longer than one year when using annual
data, and that the maximum effect occurred during the second month
after advertising.

For this paper we have reanalyzed the Lydia Pinkham data
in the time domain using Box-Jenkins procedures to see whether
additional insights into the lagged effects of advertising can be
generated. In addition, since the Box-Jenkins approach is relatively
new, it may be of interest to see it applied to well known
and readily available data.

The next section of this paper explains in greater detail the
family of advertising effectiveness models that are considered
in a Box-Jenkins transfer function analysis. A brief overview
of the analytical stages in a Box-Jenkins approach is then given.
After that, the results of each stage in the actual analysis of the
Pinkham data are described and analyzed. Following the discussion
of the final stage results, the conclusions with respect to the
dynamic advertising effects are summarized. An appendix deals
in more detail with the statistics employed in the analysis.

II.   THE BOX-JENKINS TRANSFER FUNCTION MODELS

1.   The  General  Form 

Box and Jenkins propose a rich set of response models as a
family of transfer functions (also called impulse response
functions). In their most general form the set of models can be
written as the following discrete linear process:

(1)   $S_t = v_0 A_t + v_1 A_{t-1} + \cdots + N_t$

where
$S_t$ = sales at time t,
$A_t$ = advertising at time t,
$N_t$ = sum of the effects of all variables
other than advertising.




As a matter of notational convenience we employ a backshift
operator B, which is defined by

(2)   $B Z_t = Z_{t-1}$   or   $B^m Z_t = Z_{t-m}$.

We can therefore rewrite (1) as

(3)   $S_t = (v_0 + v_1 B + v_2 B^2 + \cdots) A_t + N_t = v(B) A_t + N_t$.

The polynomial operator v(B) is defined as the transfer function
relating sales to advertising; it summarizes the dynamic structure
of the effect transferred from the advertising sequence to the
sales sequence. A restriction on the v's is that if advertising
is held at a fixed level $A_0$, then sales should eventually reach
an equilibrium level $Y_0$ (a stationarity requirement). This
assumption often necessitates differencing the sales and advertising
series to eliminate trends and other sources of non-stationarity.
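
As a rough illustration of the distributed-lag form in (1) and (3), the following Python sketch computes a sales series from a finite set of lag weights. The function name `transfer_output`, the example weights, and the treatment of pre-sample advertising as zero are assumptions made only for this illustration; none of them comes from the Pinkham analysis.

```python
def transfer_output(weights, advertising, noise=None):
    """Return S_t = sum_k v_k * A_{t-k} + N_t for a finite list of lag weights.

    Advertising before the start of the sample is treated as zero, which is
    an assumption made only for this illustration.
    """
    n = len(advertising)
    if noise is None:
        noise = [0.0] * n
    sales = []
    for t in range(n):
        # Sum the current and lagged advertising effects available at time t.
        effect = sum(v * advertising[t - k]
                     for k, v in enumerate(weights) if t - k >= 0)
        sales.append(effect + noise[t])
    return sales


# A single unit pulse of advertising traces out the lag weights themselves.
print(transfer_output([0.5, 0.3, 0.1], [1.0, 0.0, 0.0, 0.0, 0.0]))

# Holding advertising fixed lets sales settle at (sum of weights) * level,
# which is the equilibrium behaviour the stationarity requirement asks for.
print(transfer_output([0.5, 0.3, 0.1], [10.0] * 6))
```

With a fixed advertising level the output stops changing once all lags are filled, which is the sense in which a finite, summable set of weights yields an equilibrium sales level.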

A further discussion of stationarity and differencing is given
with respect to the Pinkham data in the pre-whitening section
presented below. Apart from this restriction imposed by stationarity,
the transfer function can take any polynomial form. Thus, a great
many alternative lagged effects can be accommodated. The "Box-Jenkins"
analysis basically consists of procedures for assessing
which of the many alternative over-time responses is in fact the
correct one. Given the transfer function, it is at least conceptually
possible to select $X_t, \ldots, X_{t+a}$ so as to achieve any desired
$Y_t, \ldots, Y_{t+a}$. This is the subject of much of control theory.
The substantial difficulties in formulating and deriving the optimal
control solutions will not be discussed here (the interested reader is
referred to Aoki [1]). The present problem is a less normative
one: how do we model the response of sales to advertising irrespective
of the control situation facing us?¹

2.   The  Polynomial  v(B) 

In the Box-Jenkins analysis the general polynomial v(B) is
represented by the ratio of two polynomials of small degree compared
to the degree of v(B). For example, if the response coefficients
were negligible before lag 3, after which they decayed geometrically,
we could express these coefficients in the following polynomial:

      $B^3 + \delta B^4 + \delta^2 B^5 + \cdots$

or

(4)   $v(B) = \dfrac{B^3}{1 - \delta B}$

where $\delta$ is the decay coefficient. The general form of the transfer
function is

(5)   $v(B) = \dfrac{\omega(B)}{\delta(B)} B^b$

where $\omega(B)$ is an sth order polynomial called the moving average
operator;
$\delta(B)$ is an rth order polynomial operator called the
autoregressive operator;
$B^b$ is a bth order dead time operator;
and r, s, b are integers greater than or equal to zero.

¹ Strictly speaking, the situation facing the decision maker
should influence the choice of estimates and other statistical decisions
(see Marschak [5]). In the present context we disregard this
complication, however.

Thus, the lag coefficients are specified once we have the
polynomial. To get the polynomial, the Box-Jenkins analysis first
derives the appropriate values of r, s, and b. This is the
"identification" problem, which uses cross-correlations (see
Appendix). Given the values of these three identifying parameters,
maximum likelihood estimates are then derived for the $\omega$ and $\delta$
parameters. This is the "estimation" problem.
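
To make the link between the rational form and the individual lag coefficients concrete, the sketch below expands a ratio of two polynomials in B, shifted by a dead time, into its leading impulse response weights. The function name `impulse_weights` and the convention of passing plain coefficient lists in increasing powers of B (with the denominator normalised to start at 1) are assumptions chosen for illustration.

```python
def impulse_weights(numerator, denominator, dead_time, n_weights):
    """Expand v(B) = numerator(B) * B**dead_time / denominator(B).

    Both polynomials are plain coefficient lists in increasing powers of B,
    and denominator[0] must equal 1. Returns [v_0, v_1, ..., v_{n_weights-1}].
    """
    if denominator[0] != 1:
        raise ValueError("denominator must be normalised so that it starts with 1")
    # Multiplying by B**dead_time shifts the numerator coefficients to later lags.
    shifted = [0.0] * dead_time + list(numerator)
    weights = []
    for k in range(n_weights):
        value = shifted[k] if k < len(shifted) else 0.0
        # Matching coefficients of B**k in v(B) * denominator(B) = shifted(B).
        for j in range(1, len(denominator)):
            if k - j >= 0:
                value -= denominator[j] * weights[k - j]
        weights.append(value)
    return weights


# The example of equation (4): B^3 / (1 - decay * B) with decay = 0.5.
decay = 0.5
print(impulse_weights([1.0], [1.0, -decay], dead_time=3, n_weights=7))
```

The first three weights come out as zero and the later ones fall off by the factor `decay` at each lag, which is the geometric pattern described for equation (4).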

3.   The Noise Process $N_t$

A further elaboration of the transfer function models accounts
for the effect of situational and other unspecified factors called
"noise." These factors, referred to as $N_t$, may be the composite
effect of unaccounted-for random shocks, past as well as present.
The general form of the noise is

(6)   $N_t = \dfrac{\theta(B)}{\phi(B)\,(1-B)^d}\,\epsilon_t$

where $\theta(B)$ is a qth order moving average operator;
$\phi(B)$ is a pth order autoregressive operator;
$(1-B)^d$ is a dth order difference operator;
$\epsilon_t$ is a Normal random variable;
and where p, d, and q are integers greater than or equal to zero.

As for the transfer function above, the parameters required
to specify the noise process are derived in an identification and
an estimation stage. The identification relies on autocorrelations
and partial autocorrelations (see Appendix) to specify p, d, and q,
and then the maximum likelihood estimation of the $\theta$ and $\phi$
parameters is done conditional upon the assigned p, d, and q
values.
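
The following sketch shows one way a noise series of this kind could be generated from a given sequence of shocks, reading the noise model as $\phi(B)(1-B)^d N_t = \theta(B)\epsilon_t$. The function name `simulate_noise`, the sign convention $\phi(B) = 1 - \phi_1 B - \cdots$ and $\theta(B) = 1 - \theta_1 B - \cdots$, and the zero starting values are assumptions for illustration, not details taken from the paper.

```python
def simulate_noise(shocks, ar=(), ma=(), d=0):
    """Generate N_t from phi(B) (1 - B)^d N_t = theta(B) e_t.

    ar holds phi_1..phi_p and ma holds theta_1..theta_q, with the assumed
    conventions phi(B) = 1 - phi_1 B - ... and theta(B) = 1 - theta_1 B - ...
    Values before the start of the sample are taken as zero (an assumption).
    """
    stationary = []
    for t, shock in enumerate(shocks):
        value = shock
        # Moving average part: subtract weighted past shocks.
        for j, theta in enumerate(ma, start=1):
            if t - j >= 0:
                value -= theta * shocks[t - j]
        # Autoregressive part: add weighted past values of the series.
        for i, phi in enumerate(ar, start=1):
            if t - i >= 0:
                value += phi * stationary[t - i]
        stationary.append(value)
    # Undo d rounds of differencing by cumulative summation.
    series = stationary
    for _ in range(d):
        total, summed = 0.0, []
        for value in series:
            total += value
            summed.append(total)
        series = summed
    return series


# Response of three simple noise models to a single unit shock.
print(simulate_noise([1.0, 0.0, 0.0, 0.0], ar=[0.5]))
print(simulate_noise([1.0, 0.0, 0.0, 0.0], ma=[0.4]))
print(simulate_noise([1.0, 0.0, 0.0, 0.0], d=1))
```

Passing the shocks in explicitly, rather than drawing them inside the function, keeps the sketch deterministic; Normal random draws could be supplied by the caller.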

This modeling of the noise process entails what is known as
Box-Jenkins univariate analysis. This analysis, quite apart from
its role in transfer function analysis, has been found useful in
many applications (see, for example, Nelson [6]). In the transfer
function analysis it is also used for the first stage of the
analysis (the "pre-whitening" stage; see below), which builds a model
of the input series specified by the parameters p, d, q, $\theta$ and $\phi$.

III.   THE  TRANSFER  FUNCTION  MODEL  OF  THE  LYDIA  PINKHAM  DATA 

1.   A  Brief  Overview 

To clarify the structure of this part of the paper, a brief
overview is in order. Generally, a transfer function analysis
involves the choice of three models or processes. First, since the
input  series  (advertising  in  our  case)  can  be  seen  as  strictly 
exogenous  only  if  it  is  completely  random,  one  transforms  the  given 
input  data  so  as  to  achieve  such  randomness.   This  transformation 
involves  the  first  choice  of  model:   How  should  the  input  series  be 
randomized  or  "pre-whitened"?   The  bases  for  the  model  choice  are 
the  autocorrelation  and  partial  autocorrelation  functions,  and  the 
empirical  patterns  are  compared  to  alternative  theoretical  forms. 

Second, after the output series (sales in our case) has been
transformed similarly, the cross-correlation function between the
transformed advertising and sales series is used to arrive at a
best representation of the effect of advertising upon sales.
Again, empirical and theoretical patterns are compared for the
model choice. Third, the residuals of the fit are looked upon as
another time series, and a new univariate analysis (as in the first
pre-whitening stage) is applied to derive a transformation of this
"noise" process so as to make it completely random. The
Box-Jenkins transfer function analysis consists of the derivation of
a pre-whitening model, an impulse response function (the transfer
function proper), and a noise model.

It should be emphasized that the approach is wholly empirical.
The comparisons between various correlation patterns are based
upon what a model with a particular parameter structure generates
in terms of correlations, not upon a priori theory of, say, the
advertising-sales relationship. Partly as a consequence of this
empiricism, and as can be seen from the analysis to follow, many
of the model choice questions will have to be resolved on fairly
ambiguous bases. For any particular correlation pattern found in
a sample, there will generally be many alternative models that
could have generated the data. As a consequence, many of
the model choices become a matter of art. Because of this it
becomes even more necessary than usual to present the statistical
evidence upon which the choices and rejections were based.

Since the pre-whitening stage (which represents the first step
in the analysis) contains many of the procedures followed in the
other two stages, we will give disproportionate emphasis to that
stage in our presentation. Also, the forecasting aspect is neglected
in this paper, although forecasting is often the aim of a
typical Box-Jenkins analysis. For example, the transfer function
model might be used for forecasting, with the pre-whitening process
first used to generate the future input values, and the impulse
response function plus the noise process applied to generate the
forecasts of the output series. In fact, the best test of the
effectiveness of the advertising inputs might be to forecast sales
on the basis of time alone, using a univariate approach, and
then do the same with the transfer function model. If no improvement
occurs with the latter, the advertising might possibly have
no particular power to influence sales.

2.   Pre-whitening  of  the  Advertising  Data 

The  aim  of  this  section  is  to  determine  the  appropriate  form 
in  which  the  advertising  series  should  be  used  for  the  analysis 
in  stages  two  and  three.   Ideally  the  advertising  should  have  been 
randomly  generated  to  avoid  problems  such  as  reverse  causality. 
If  the  decision  maker  can  engage  in  experimental  randomization, 
for  example,  in  a  local  test  market,  we  would  have  perfect  data 
with  which  to  test  the  effect  of  changes  in  advertising.   As  it  is, 
the first step in the analysis is to transform the actual advertising
history to a random sequence. That is, we use transformations
in order to eliminate the symptoms of nonrandomness in the
independent variable. The signs of nonrandomness are generally
heteroscedasticity, lack of a fixed mean, and autocorrelation.

A visual inspection of the raw advertising data in Figure
1(a) and (b) does not indicate heteroscedasticity. As can be
seen, the later advertising values do not exhibit any particularly
large swings as compared to the early ones.

Whether Figure 1 indicates a process with a fixed mean or not
is more problematical. Both possibilities were entertained. The
sequence of first differences of advertising clearly has a fixed
(near zero) mean, as can be seen from Figure 2(a) and (b). However,
we want to postpone the final choice of d=0 versus d=1 until we
have considered the autocorrelations.

The autocorrelation at lag k, k = 1, 2, ..., is the correlation
between realizations of the process separated by k periods (see
Appendix). Significant autocorrelations in Figure 3 imply that
there is a dependency between successive advertising inputs;
thus, advertising cannot be seen as randomly generated. As in the
case of modeling the univariate noise process, we eliminate this
autocorrelation by identifying the p and the q of the autoregressive
moving average process which transforms advertising to a
random series.¹ We consider what kind of autocorrelation would
be produced by each kind of possible p, d, q process and then
compare this pattern of theoretical autocorrelations to the sample
autocorrelations. Models with theoretical patterns distinctly
different from the observed pattern are eliminated. Cases in which
the empirical autocorrelations are somewhat similar to a theoretical
pattern (allowing for sampling variations) are considered
potential candidates for further investigation.

¹ To preserve the relationship with the output (sales) series,
the same transformation is applied to that series as well (see
below).

Figure 1(a). Lydia Pinkham annual advertising expenditures from
1906 to 1935.

Figure 1(b). Graph of the Lydia Pinkham annual advertising
expenditures.

(advertising at year t minus advertising at year t-1)

Figure 2(b). Graph of the first differences of advertising
expenditures.

Figure 3(b). Autocorrelations of the first differences of
advertising.
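
The sample autocorrelations compared against the theoretical patterns can be computed along the following lines. This is a minimal sketch: the function names are hypothetical, and the estimator shown (lagged cross-products of deviations from the mean, divided by the lag-zero sum) is one common definition assumed here, since the paper's own formula is given only in its Appendix.

```python
def first_differences(series):
    """Return the d=1 transformation: value at t minus value at t-1."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


def sample_autocorrelations(series, max_lag):
    """Return [r_1, ..., r_max_lag] for the given series."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    # Lag-zero autocovariance (the sample variance with divisor n).
    c0 = sum(dev * dev for dev in deviations) / n
    result = []
    for k in range(1, max_lag + 1):
        ck = sum(deviations[t] * deviations[t + k] for t in range(n - k)) / n
        result.append(ck / c0)
    return result


# A strictly alternating series is strongly negatively correlated at lag 1
# and positively correlated at lag 2.
alternating = [1.0, -1.0] * 4
print(sample_autocorrelations(alternating, 2))
print(first_differences([1.0, 3.0, 6.0, 10.0]))
```

Running the same function on the raw series and on its first differences gives the two sets of correlations needed for the choice between d=0 and d=1.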

In addition to the autocorrelations, it is useful also to
consider the partial autocorrelations before deciding upon the
transformation (see Appendix). The partial autocorrelation
function is a statistic which exploits the fact that autocorrelations
at lag k may be a simple recursive function of autocorrelations
at lags no greater than k. The sample partial autocorrelation
at lag k is an estimate of the kth (that is, the last) autoregressive
coefficient when an autoregressive process of order k is taken as
the candidate process (Figure 4(a) and (b)).
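
One standard way to obtain partial autocorrelations from a set of autocorrelations is the Durbin-Levinson recursion, sketched below. The paper does not say which computational method was used, so the recursion and the function name `partial_autocorrelations` are assumptions made for illustration.

```python
def partial_autocorrelations(acf, max_lag):
    """Durbin-Levinson recursion.

    acf[k - 1] is the autocorrelation at lag k. Returns the partial
    autocorrelations at lags 1..max_lag.
    """
    rho = [1.0] + list(acf)
    pacf = []
    prev = []  # coefficients of the fitted AR(k-1) model
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            current = [phi_kk]
        else:
            num = rho[k] - sum(prev[j - 1] * rho[k - j] for j in range(1, k))
            den = 1.0 - sum(prev[j - 1] * rho[j] for j in range(1, k))
            phi_kk = num / den
            # Update the lower-order coefficients, then append the new last one.
            current = [prev[j - 1] - phi_kk * prev[k - j - 1]
                       for j in range(1, k)] + [phi_kk]
        pacf.append(phi_kk)
        prev = current
    return pacf


# For a first order autoregressive process with coefficient 0.6 the
# autocorrelations are 0.6**k; only the first partial autocorrelation
# should be materially different from zero.
print(partial_autocorrelations([0.6, 0.36, 0.216], 3))
```

This is the cut-off behaviour used in the identification: a pth order autoregressive process has partial autocorrelations that vanish after lag p.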

Figure 5 shows the patterns of the autocorrelation function
and the patterns of the partial autocorrelation function that can
be derived for different p, d, q models. The graphs of the sample
autocorrelations and partial autocorrelations of the advertising
series are exhibited in Figures 6 through 9. The following p, d,
q models can be considered most representative of the advertising
process.

Alternative 1:  p=1, d=0, q=0
(1st order autoregressive)

Supporting Evidence:  The autocorrelation function tails
off (more linearly, however, than exponentially; see Figure 6).
The partial autocorrelations have a major spike at lag 1
(see Figure 7). The spikes at lags 3 and 5 can be
interpreted as caused by sampling error. Note: the approximate
standard error of this estimate applies to each partial
autocorrelation taken separately and is not the standard
error of the function. Hence the larger the number of lags,
the greater the possibility of some partial autocorrelation
exceeding 2 standard deviations.

Refuting Evidence:  The relatively slow tapering off of
the autocorrelations is an indication of possible
non-stationarity (compare the models below where d=1).

Figure 4(a). The partial autocorrelations of the advertising
expenditures.

Figure 4(b). The partial autocorrelations of the first differences
of advertising.


Alternative 2:  p=2, d=0, q=0
(2nd order autoregressive)

Supporting Evidence:  The full span of the autocorrelation
function (see Figure 6) looks like a damped sine function.
This pattern is expected from some (2,0,0) models; see the
pattern in the lower right hand area of the triangle in
Figure 10. The partial autocorrelation at lag 2 (see Figure 7)
is negative, as expected (see Figure 10).

Alternative 3:  p=2, d=1, q=0
(2nd order autoregressive, with 1st differencing)

Supporting Evidence:  The autocorrelation function (see
Figure 8) has a distinct damped sine pattern indicative of the
2nd order autoregressive model (see Figure 10). The partial
autocorrelations (Figure 9) are again first positive and second
negative, as expected (see Figure 10). The fourth autoregressive
parameter is large and positive, but it is discounted because of
the strange behavior of those time periods, four years apart,
occurring in 1926-27, 1930-31, and 1934-35.


Alternative 4:  p=1, d=1, q=1
(mixed 1st order autoregressive, moving average, with 1st differencing)

Supporting Evidence:  The partial autocorrelation function (see
Figure 9) looks like a damped sine function if one includes the
positive spike at lag 4. This pattern can be explained by a mixed
first order autoregressive and first order moving average process
(see the lower right hand quadrant of Figure 11).

Figure 10. Typical autocorrelation and partial autocorrelation
functions $\rho_k$ and $\phi_{kk}$ for various second order
autoregressive models.

Figure 11. Typical autocorrelation and partial autocorrelation
functions $\rho_k$ and $\phi_{kk}$ for various mixed first order
autoregressive - first order moving average models.

The final choice between these four specifications is not
easy to make on the basis of the autocorrelations and partial
autocorrelations alone since, as we have seen, all four versions
can be supported. In such a case it is useful to proceed to the
estimation of all four alternatives to get their respective $\hat{\theta}$ and
$\hat{\phi}$ estimates and then make the choice on the basis of the residual
sum of squares, the significance of the estimates, and other customary
statistics.

The estimation of $\theta$ and $\phi$ is done using a maximum likelihood
method with a non-linear least squares algorithm [2]. The estimates
derived for the four models are presented in Figure 12. Judging
from the standard errors of the estimates and the residual sum of
squares, the specification p=2, d=1, and q=0 seems the best of
the four. Consequently, this became our chosen pre-whitening
transformation.¹

If we have succeeded in transforming the advertising data
to a random series, then the series shown in Figure 13 should not
be autocorrelated. Figures 14 and 15 show the autocorrelation
and partial autocorrelation functions of the randomized series.

¹ The results in this section of the paper are based upon
computer runs using only the first 40 observations. Similar
(in fact almost identical) results were obtained when the full
54 years of data were used. In the transfer function analysis
below, the pre-whitening parameters $\theta$ and $\phi$ were based upon the
full sample runs. The values estimated for the parameters
and then used below are $\theta = 0$, $\phi_1 = .074$ and $\phi_2 = -.407$.
Figure 13. Residuals between the actual advertising expenditure
series and the series of values generated by the
(2,1,0) pre-whitening transformation.

Figure 14. The autocorrelation function of the residuals of
the (2,1,0) pre-whitening transformation.

A Chi-square statistic which tests the smallness of a whole
set of sample autocorrelations can be used here to test for
randomness. Figure 16 shows the value of the Chi-square statistic for
the selected pre-whitening model and, for comparative purposes, also
the Chi-square values of the other three candidate models. Clearly
(2,1,0) emerges with substantially less autocorrelation than any
of the other models considered. A $\chi^2$ value of around 8 is near the
expected value for 8 degrees of freedom. Therefore the autocorrelation
is near the amount expected by chance.
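
A statistic of this kind can be sketched as follows, assuming it takes the Box-Pierce form, in which the number of observations multiplies the sum of squared residual autocorrelations. The paper does not spell out the formula or the number of lags used, so the function name `box_pierce_q`, the degrees-of-freedom rule (lags minus fitted parameters), and the example values are all assumptions.

```python
def box_pierce_q(residual_autocorrelations, n_observations, n_fitted_parameters):
    """Return (Q, degrees of freedom) for a set of residual autocorrelations.

    Q = n * sum(r_k ** 2) over the supplied lags. Under the hypothesis that
    the residuals are random, Q is approximately Chi-square distributed with
    (number of lags - number of fitted parameters) degrees of freedom.
    """
    q = n_observations * sum(r * r for r in residual_autocorrelations)
    dof = len(residual_autocorrelations) - n_fitted_parameters
    return q, dof


# Hypothetical residual autocorrelations, not the Pinkham values.
example_r = [0.10, -0.20, 0.05, 0.15, -0.10, 0.05, 0.00, 0.10, -0.05, 0.05]
q_value, dof = box_pierce_q(example_r, n_observations=40, n_fitted_parameters=2)
print(q_value, dof)
```

Since the mean of a Chi-square variable equals its degrees of freedom, a Q value close to the degrees of freedom is what random residuals would be expected to produce.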

3.   Deriving  the  Impulse  Response  Function 

On the basis of the pre-whitening function derived in the
previous section, we will now proceed to the second stage of the
analysis, the derivation of the impulse response (or transfer)
function itself. Having gone into great detail in the previous
discussion, we are now in a position to treat the procedural
questions more briefly. Again, as we will see,
the ultimate model choice is based upon rather artful considerations
of correlational output, in this case the cross-correlation
function. Another similarity with the previous analysis is the
use of an identification stage (here determining r, s, b)
and an estimation stage (here of the parameters $\omega$ and $\delta$). These
quantities were defined above in section II.
Even before the identification analysis begins, however,
some transformations of the output variable (sales) will have to
be made. First, as mentioned at the very beginning, there is a
need for sales to be stationary so that some equilibrium level
might be reached for a fixed advertising level. This might
necessitate a differencing of the original sales series, identical
to the differencing done in the univariate pre-whitening
approach. The graphs of the original sales series and the series
differenced once appear in Figures 17 and 18. As can be seen, the
patterns are very similar to those of the advertising data discussed
earlier. Judging from these patterns and the autocorrelations
in Figures 19 and 20, we decided to take first differences of
sales for the subsequent analysis.

(p,d,q) Model    Q statistic    Degrees of freedom

(1,0,0)          18.7           8
(2,0,0)          18.5           7
(2,1,0)           8.3           8
(1,1,1)          19.2           8

Figure 16. Q statistics for candidate pre-whitening transformations.
The Q statistic is distributed like a Chi-square.

It should be pointed out that with the given pre-whitening
transformation of the advertising series and with the present
transformation of sales, the transfer function (1) of section II
will relate the first differences of both sales and advertising,
rather than the original values. Second, since the pre-whitening
transformation applied to the advertising data will affect the
cross-correlations between the two series, we need to transform
the sales data as well with the same p, q operators.

In the preceding section we found the pre-whitening process
to have p=2, d=1, q=0 and so

(7)   $\alpha_t = (1 - .074B + .407B^2)(1-B)X_t$

where $X_t$ is the original advertising series and $(1-B)X_t = A_t$ is
a differencing needed to make the inputs vary about a fixed mean.
Applying the same pre-whitening transformation in (7) to the
original sales $Y_t$ we get

(8)   $\beta_t = (1 - .074B + .407B^2)(1-B)Y_t$.
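
Applying the pre-whitening filter to a series can be sketched as below, using the coefficient values .074 and -.407 reported in the text. The function name `prewhiten` and the choice to drop the first three observations (which lack the lagged values the filter needs) are assumptions made for illustration.

```python
PHI_1 = 0.074   # first autoregressive coefficient reported in the text
PHI_2 = -0.407  # second autoregressive coefficient reported in the text


def prewhiten(series, phi_1=PHI_1, phi_2=PHI_2):
    """Apply (1 - phi_1 B - phi_2 B^2)(1 - B) to a series.

    With the default values this is (1 - .074B + .407B^2)(1 - B).
    The output is three observations shorter than the input, because the
    leading values lack the lags needed by the filter.
    """
    # Step 1: first differences, (1 - B) applied to the original series.
    diffs = [series[t] - series[t - 1] for t in range(1, len(series))]
    # Step 2: second order autoregressive filter applied to the differences.
    return [diffs[t] - phi_1 * diffs[t - 1] - phi_2 * diffs[t - 2]
            for t in range(2, len(diffs))]


# Hypothetical input values, used for both the advertising and sales roles.
advertising = [600.0, 650.0, 640.0, 700.0, 760.0, 730.0, 800.0]
sales = [1100.0, 1180.0, 1150.0, 1260.0, 1330.0, 1300.0, 1400.0]
alpha = prewhiten(advertising)
beta = prewhiten(sales)
print(alpha)
print(beta)
```

The same function is applied to both series so that the relationship between them is preserved, as the text requires.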

A third possible source of transformations of the sales data
in fact precedes the two given. It might very well be that we
want to impose some particular functional form (say, logarithmic)
upon the sales-advertising relationship because of prior knowledge.
This can be done in the same way as in the usual econometric
analyses, i.e., by transforming the individual variables first and
then relating their transformed values linearly. If such a
transformation is deemed desirable, it should be carried out before
any of the analysis discussed so far, including the pre-whitening
of the advertising series. In the present case, no such
transformation of the data was made.

After these transformations have been completed, the
identification analysis of the impulse response function can be carried
out. The requisite r, s, and b parameters can be chosen on the
basis of the cross-correlation function (see Appendix) between
the transformed sales and advertising series. To see why these
cross-correlations are of such importance (they do in fact
determine the impulse response function directly), the following
derivation will be instructive.

Recall that $(1-B)Y_t = S_t$. We replace $S_t$ by (3) of section II
and multiply by $\alpha_{t-k}$. Taking the expected value, and recognizing
that the noise $N_t$ does not covary with $X_t$, we have

(9)    $E(\alpha_{t-k}\beta_t) = E(\alpha_{t-k}\, v(B)\alpha_t)$.

Now,

(10)   $E(\alpha_{t-k}\beta_t) = E(\alpha_{t-k}(v_0\alpha_t + v_1\alpha_{t-1} + v_2\alpha_{t-2} + \cdots))$

(11)   $\qquad\qquad\quad = v_k E(\alpha_{t-k}^2)$

since the $\alpha_t$'s are constructed to be uncorrelated shocks with zero
mean and variance $\sigma_\alpha^2$. The kth impulse response coefficient is
therefore simply related to the covariance of $\alpha_{t-k}$ and $\beta_t$, and
thus to the cross-correlation:

(12)   $v_k = \dfrac{\mathrm{cov}(\alpha_{t-k}, \beta_t)}{\sigma_\alpha^2} = \mathrm{corr}(\alpha_{t-k}, \beta_t)\,\dfrac{\sigma_\beta}{\sigma_\alpha}$

Thus, the kth coefficient of the transfer function is identical
(except for a scale factor) to the cross-correlation between the
transformed sales series and the pre-whitened advertising series
lagged k periods.
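
The relation between the cross-correlations and the impulse response coefficients can be sketched numerically as follows: each estimate is the sample cross-correlation at lag k multiplied by the ratio of the standard deviation of the transformed sales series to that of the pre-whitened advertising series. The function names and the example data are hypothetical, and the divisor-n estimators are an assumption.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _std(values):
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def cross_correlation(alpha, beta, lag):
    """Sample correlation between alpha[t - lag] and beta[t], for lag >= 0.

    alpha and beta must have the same length.
    """
    n = len(alpha)
    mean_a, mean_b = _mean(alpha), _mean(beta)
    cov = sum((alpha[t - lag] - mean_a) * (beta[t] - mean_b)
              for t in range(lag, n)) / n
    return cov / (_std(alpha) * _std(beta))


def impulse_response_estimates(alpha, beta, max_lag):
    """Return estimates of v_0..v_max_lag as corr * (std of beta / std of alpha)."""
    scale = _std(beta) / _std(alpha)
    return [cross_correlation(alpha, beta, k) * scale
            for k in range(max_lag + 1)]


# Hypothetical pre-whitened input, and an output that responds only
# contemporaneously with a weight of 2.
alpha = [1.0, -1.0, 0.5, -0.5, 2.0, -2.0]
beta = [2.0 * a for a in alpha]
print(impulse_response_estimates(alpha, beta, 2))
```

In this constructed case the lag-zero estimate recovers the weight of 2; the estimates at other lags reflect only the sample correlation within the short example series.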

The sample cross-correlation function of the sales-advertising
relationship for the transformed Pinkham data is presented in
Figure 21. On the basis of this graph the appropriate values of
the r, s, and b parameters can be chosen (see Figure 22), just as
in the case of the p, d, and q parameters of the univariate
analysis. Comparing the theoretical patterns in the table with the
sample correlations in the graph, we conclude that there are two
candidate r, s, and b model specifications, corresponding to two
alternative transfer functions.
Figure 21(a). Cross-correlation function between the pre-whitened
advertising and sales series.

Figure 21(b). The impulse response function (the transfer function)
between the pre-whitened advertising and sales series.
Alternative 1:  r=1, s=0, b=0
(1st order autoregressive)

Supporting Evidence:  The first three cross-correlations
decline gradually, almost exponentially, which would tend to
argue for an autoregressive process. The later cross-correlations
are irregular, some even negative, but none is significantly
different from zero. This type of specification is a Koyck model
and is close to some of the versions run by Palda. The major
difference is that in our case both the advertising and the
sales data have been differenced so as to achieve stationarity
before being correlated.

Alternative 2:  r=0, s=1, b=0
(1st order moving average)

Supporting Evidence:  Taking notice of the fact that the
cross-correlation at t-2 is, in fact, insignificant at the .05
level, one can argue that only the first two correlations should
be considered. In addition, the cross-correlations at t-3 and t-4
are near zero, so there does not seem to be much of an
autoregressive effect showing. This is close to the stand taken
by Clarke and McCann in their spectral analysis. Again, there are
differences due to the differing treatment of the stationarity
requirement.

Because  of  the  correspondence  between  the  models  here  and 
the  previous  research,  it  is  of  special  interest  to  see  if  the  Box- 
Jenkins  approach  can  resolve  the  conflict  as  to  the  number  of  per- 
iods over  which  advertising  has  substantial  effects.   Figure  2  3  is 
a  graph  of  the  respective  lag  coefficients  for  the  two  contending 
models  and  of  the  Box- Jenkins  impulse  response  coefficients.   The 
Box-Jenkins  coefficients  consist  simply  of  the  impulse  response 
coefficients  derived  directly  from  the  cross-correlations.   In  each 
case  the  lags  after  the  first  year  are  statistically  insignificant 
at  the  .05  level.   Note  however  that  the  values  of  the  0th  and  1st 
lag  coefficients  are  very  close  for  all  three  models.   The  major 
discrepancy  between  the  Palda  and  the  Clarke  and  McCann  models  is 
in  lags  2,  3,  and  4  and  higher  which  the  Palda  model  shows  as  posi- 
tive but  the  Clarke  and  McCann  data  show  to  be  near  zero.   Our 
results  show  the  2nd  to  be  positive  while  the  3rd  and  4th  are  near 
zero.   On  the  basis  of  the  cross-correlations  alone,  nothing  con- 
clusive can  clearly  be  said.   At  this  stage  it  is  obviously  a 
matter  of  rather  difficult  judgment  which  one  of  the  two  processes 
should be chosen. A final choice is postponed until we obtain
preliminary estimates of the second part of the transfer function
model, the noise process, $N_t$.

                  Palda's             Clarke-McCann      Box-Jenkins
                  estimates           estimates          estimates
  Coefficient     based on            based on           based on
  at lag k        $\delta$ = .628,    FDR                cross-
                  $\omega$ = .537                        correlations

      0             .537                .642               .649
      1             .337                .297               .408
      2             .211                .051               .152
      3             .133                .039              -.062
      4             .082               -.062               .040
      5             .052               -.189              -.314

Figure 23. Graphs of the Palda transfer function, the Clarke and
McCann transfer function and the Box-Jenkins transfer function
estimated from the cross-correlation of the pre-whitened
advertising and sales series.
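
To make the difference between the two candidate specifications concrete, the following sketch computes the lag weights each one implies. It assumes the usual Box-Jenkins conventions, namely that (r,s,b) = (1,0,0) gives weights $\omega \delta^k$ and that (0,1,0) gives $v(B) = \omega_0 - \omega_1 B$; the function names are hypothetical, and the moving-average parameter values are illustrative only, not estimates from this study.

```python
def koyck_weights(omega, delta, n_lags):
    """Lag weights of a (1,0,0) transfer function: v_k = omega * delta**k."""
    return [omega * delta ** k for k in range(n_lags)]

def finite_lag_weights(omega0, omega1, n_lags):
    """Lag weights of a (0,1,0) transfer function: v(B) = omega0 - omega1 * B."""
    weights = [omega0, -omega1] + [0.0] * max(n_lags - 2, 0)
    return weights[:n_lags]

# Parameters shown for the Palda column of Figure 23.
palda = koyck_weights(omega=0.537, delta=0.628, n_lags=6)
print([round(w, 3) for w in palda])

# Illustrative (hypothetical) values for the moving-average alternative:
# the response is cut off exactly after lag 1.
ma = finite_lag_weights(omega0=0.6, omega1=-0.3, n_lags=6)
print(ma)
```

With $\omega$ = .537 and $\delta$ = .628 the geometric weights agree with the Palda column of Figure 23 to within about .002. The first specification decays gradually and never reaches zero, while the second stipulates no effect at all beyond lag 1.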

4.   The  Noise  Process

The  identification  and  estimation  of  the  noise  process  follows 
the  univariate  analysis  completely.   The  series  under  analysis 
consists  simply  of  the  residuals  of  the  fitted  transfer  function. 
Without  making  any  assumptions  about  the  structure  of  this  transfer 
function  we  can  calculate  residuals  directly  by 

(13)  $N_t = Y_t - v(B) X_t$.

Here a limited order of 10 is assumed for v(B). The residuals were
quite random, as can be seen in Figures 24 and 25. The Chi-square
test of the autocorrelations showed no significant autocorrelation
at the .05 level. On the basis of this identification procedure we
conclude that p, d, and q in Equation (6) are all zero. Therefore,
we simply have

$N_t = \varepsilon_t$

where $\varepsilon_t$ is an independently and identically distributed
Normal random variable.
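
The residual calculation in Equation (13) amounts to subtracting a finite distributed lag of the input from the output. A minimal sketch follows, assuming the impulse response weights are held in an array of length 11 (lags 0 through 10); the function name and the simulated series are hypothetical and stand in for the transformed Pinkham data.

```python
import numpy as np

def noise_residuals(y, x, v):
    """Return N_t = Y_t - sum_k v[k] * X_{t-k} for all t with a full lag history."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    m = len(v) - 1                       # highest lag in v(B)
    fitted = np.convolve(x, v)[: len(x)]  # v(B) applied to X
    return (y - fitted)[m:]              # drop the first m periods (incomplete lags)

# Simulated illustration with hypothetical weights up to lag 10.
rng = np.random.default_rng(1)
x = rng.normal(size=60)
v = 0.6 * 0.5 ** np.arange(11)
noise = 0.2 * rng.normal(size=60)
y = np.convolve(x, v)[:60] + noise

residuals = noise_residuals(y, x, v)
print(len(residuals))
```

The first ten observations are discarded because their lagged inputs are not observed, which is a noticeable cost for a short annual series.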

5.   The  Maximum  Likelihood  Estimates  of  the  Transfer  Functions 

As in the pre-whitening stage, we attempt to resolve the model
choice question by calculating the maximum likelihood estimates of
the $\omega$ and $\delta$ parameters, and then using goodness-of-fit
and diagnostic statistics. Figure 26 shows the two models with the
maximum likelihood estimates and the upper and lower 95% confidence
limits of the two candidate transfer functions.

Figure 26. Maximum likelihood estimates and upper and lower
confidence limits of the two candidate transfer function models.

Deciding which of the two models should be selected depends
in part on which of the two models has the lower residual sum of
squares. Figure 27 shows that model (1,0,0) fits the data slightly
better. This is our choice of the transfer function.

Box-Jenkins procedures do not rely solely on the residual
sum of squares criterion. Indeed, it is not until the later stages
of the procedures that this criterion is used. In the identification
stage all but two models were eliminated on the basis of the
dissimilarity between the theoretical cross-correlation function
expected of them and the sample pattern. This basis for eliminating
other models was irrespective of their possible low residual sum
of squares.

6.   Diagnostic  Checks 

The Box-Jenkins procedures do not stop with the goodness-of-fit
criterion. Checks are made to be sure that the residuals of
the model chosen in step 5 are not autocorrelated. Were they
autocorrelated, this would be a warning that the model does not
conform to the theoretical pattern expected of it. The residual
series of (1,0,0) shown in Figure 28 has nonsignificant
autocorrelation as seen in Figure 27.

A  second  diagnostic  check  is  to  verify  that  there  is  no  cross- 
correlation  between  the  pre-whitened  advertising  input  series  and 
the  residuals.   Were  there  significant  cross-correlation  this 
would  indicate  a  lack  of  independence  between  the  independent 
variable and the error term, hence confounding the effect of
advertising and the unspecified error variable.
Throughout this analysis this concern, a concern shared in
theory by all regression model-builders, has been stressed. Figure
29 shows nonsignificant cross-correlation. Should either diagnostic
check have been significant, Box and Jenkins [2] describe further
procedures to modify the model appropriately.

IV.   DISCUSSION 

After having gone through the full transfer function analysis
of Box-Jenkins, we support the Clarke and McCann conclusion that
Lydia Pinkham advertising had no substantial effects past one year.
This  conclusion  would  have  been  even  more  absolute  had  we  selected 
the  (0,1,0)  model  which  stipulates  absolutely  no  effect  past  one 
year.   The  (1,0,0)  model  which  we  marginally  preferred  over  the 
(0,1,0)  has  only  very  small  effects  past  one  year.   The  graph  in 
Figure  30  shows  the  convergence  of  both  models  to  the  Clarke  and 
McCann  estimates. 

It should be remembered that there are in general a large
number of models which are also quite reasonable representations
of the empirical relation between advertising and sales, but which
have been eliminated by the procedures outlined in this paper. In
comparison to the usual regression "data mining" exhibited in much
research (and also present in Palda's monograph), one can argue
that the Box-Jenkins analysis provides a much more elegant and
efficient method for screening a large set of potential models.

Another feature of the present type of analysis is the
emphasis it places upon the need for the causal input to be truly
exogenous and the role of experimentation. One might even argue
that since the decision maker has a need to understand the responses
to his actions, he should actively interfere with the system on a
randomized experimental design basis so as to eliminate the possible
confounding influences from other unspecified variables. This
problem becomes especially acute in the case of advertising studies
based, like the present one, on historical data. In such cases
the pre-whitening might provide a safeguard against the generation
of spurious correlations in the transfer function.

Figure 29. The cross-correlation function between the residual
series and the pre-whitened advertising input series.

Figure 30. Comparison of Box-Jenkins transfer functions with

Although  the  Box-Jenkins  analysis  is  very  close  to  spectral 
analysis,  the  mathematics  is  developed  in  the  more  familiar  time 
domain  as  opposed  to  the  more  esoteric  frequency  domain  employed 
in  the  latter  analysis.   Also,  even  though  the  terminology  used 
in a Box-Jenkins analysis might seem strange at first, not much
effort  is  required  to  obtain  a  rudimentary  working  knowledge  of 
the  procedures  involved.   In  addition,  the  use  of  correlation 
statistics  should  make  for  easier  adoption  than  the  inverse  Fourier 
transforms. 

Overall, the usefulness of a Box-Jenkins approach should be
greatest in the cases where a relatively long and continuing
history of sales and advertising data is available, and where the
sales variable is relatively uninfluenced by other factors
controllable by management. The introduction of several input
variables is difficult because of computational requirements and the
difficulty  of  pre-whitening  several  input  series  independently  of 
each  other.   This  restriction  might  well  prove  to  be  a  severe 
obstacle  in  many  marketing  applications.   Overcoming  this 
restriction  will  be  the  object  of  further  research  in  building 
effectiveness  models  for  marketing  decision  makers. 


APPENDIX 
STATISTICS  USED  IN  BOX-JENKINS  ANALYSIS 

For a stationary stochastic process $\{z_t\}$ the autocorrelation
function is the set of correlations between $z_t$ and $z_{t+k}$
(k = 0, 1, 2, ...). This correlation, $\rho_k$, called the
autocorrelation at lag k, does not depend on t; this follows from
the definition of stationarity. The sample autocorrelation
function, $r_k$ (k = 0, 1, 2, ...), is calculated in a way similar
to any sample correlation.

(16)  $r_k = \dfrac{\frac{1}{n} \sum_{t=1}^{n-k} (z_t - \bar{z})(z_{t+k} - \bar{z})}{\frac{1}{n} \sum_{t=1}^{n} (z_t - \bar{z})^2}$

The standard error SD[$r_k$] has been estimated by Bartlett to be
approximately $\sqrt{1/n}$ when k=0 and increasing thereafter. Rather
than assess $r_k$ individually for each lag, it is often convenient,
when diagnosing residual error, for example, to assess whether
or not the series, as a whole, has significant autocorrelations.
For this we use the Q statistic (developed by Box and Pierce)
which is distributed like a Chi-square.
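
As an illustration of these two statistics, the sketch below computes the sample autocorrelations of Equation (16) and the Box-Pierce statistic in its usual form, Q = n times the sum of the first K squared autocorrelations. The function names and the simulated autoregressive series are hypothetical; the critical value 18.31 is the .05 point of a Chi-square with 10 degrees of freedom, and in practice the degrees of freedom would be reduced by the number of fitted parameters.

```python
import numpy as np

def sample_acf(z, max_lag):
    """Sample autocorrelations r_0..r_max_lag, following Equation (16)."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    d = z - z.mean()
    c0 = np.sum(d * d) / n
    return np.array([np.sum(d[: n - k] * d[k:]) / n / c0
                     for k in range(max_lag + 1)])

def box_pierce_q(z, max_lag):
    """Box-Pierce portmanteau statistic: n * sum of squared r_k, k = 1..max_lag."""
    r = sample_acf(z, max_lag)[1:]
    return len(z) * np.sum(r ** 2)

# Simulated illustration: a strongly autocorrelated AR(1) series.
rng = np.random.default_rng(2)
shocks = rng.normal(size=500)
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.8 * ar1[t - 1] + shocks[t]

q = box_pierce_q(ar1, max_lag=10)
print(q > 18.31)  # compare with the .05 Chi-square critical value, 10 d.f.
```

A residual series that passes this check yields a Q below the critical value; the simulated series here is deliberately autocorrelated, so it fails.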

The  inversion  of  a  stochastic  process  means  expressing  the 
error  as  a  function  of  the  observed  series 

(17)  $\alpha_t = \pi(B) z_t$.

Assuming that $\pi_k(B)$ is a kth order equation

(18)  $\pi_k(B) = 1 - \phi_{k1} B - \phi_{k2} B^2 - \cdots - \phi_{kk} B^k$

we can get an estimate $\hat{\phi}_{kk}$, called the sample partial
autocorrelation, by solving the Yule-Walker equations [2]. If there is
an integer p beyond which the $\hat{\phi}_{kk}$ are insignificant, that
is, $\phi_{kk} = 0$ for k = p+1, p+2, ..., we have discovered a
parsimonious way of describing the process. The SD[$\hat{\phi}_{kk}$]
is approximately $\sqrt{1/n}$.
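
The Yule-Walker calculation can be sketched as follows. For each order k, the autocorrelations at lags 1 through k are regressed on the k-by-k matrix of autocorrelations at lags 0 through k-1, and the last solved coefficient is the partial autocorrelation at lag k. The function name is hypothetical, and a direct linear solve is used here for clarity rather than the recursive scheme a production program would more likely employ.

```python
import numpy as np

def pacf_from_acf(rho):
    """Partial autocorrelations phi_kk, k = 1..K, from autocorrelations rho[0..K].

    For each k, solves the Yule-Walker system R * phi = (rho_1, ..., rho_k),
    where R[i, j] = rho[|i - j|], and keeps the last coefficient phi_kk.
    """
    rho = np.asarray(rho, dtype=float)
    max_lag = len(rho) - 1
    pacf = []
    for k in range(1, max_lag + 1):
        R = np.array([[rho[abs(i - j)] for j in range(k)] for i in range(k)])
        phi = np.linalg.solve(R, rho[1 : k + 1])
        pacf.append(phi[-1])
    return np.array(pacf)

# Theoretical autocorrelations of a first-order autoregressive process
# with parameter 0.5: rho_k = 0.5**k.
rho = 0.5 ** np.arange(5)
print(np.round(pacf_from_acf(rho), 6))
```

For a first-order autoregressive process the partial autocorrelation equals the autoregressive parameter at lag 1 and vanishes at all higher lags, which is the cutoff pattern used to choose p.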

For two stationary stochastic processes $\{\alpha_t\}$ and
$\{\beta_t\}$, the cross-correlation function is the set of
correlations between $\alpha_{t-k}$ and $\beta_t$, k = 0, 1, 2, ....
The cross-correlation at lag k, $\rho_{\alpha\beta}(k)$, is
estimated by

(19)  $r_{\alpha\beta}(k) = \dfrac{\frac{1}{n} \sum_{t=1}^{n-k} (\alpha_t - \bar{\alpha})(\beta_{t+k} - \bar{\beta})}{SD(\alpha) \, SD(\beta)}$.

The standard error of $r_{\alpha\beta}(k)$, SD($r_{\alpha\beta}(k)$), is
roughly of the order of $\sqrt{1/n}$. Should the sample
cross-correlation function be insignificant for all k, one infers
that $\{\alpha_t\}$ has no effect on $\{\beta_t\}$.